Janitor

June 6, 2017

From the description file:

Janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced users can already do everything covered here, but they can do it faster with janitor and save their thinking for more fun tasks.

The janitor functions expedite the initial data exploration and cleaning that comes with any new data set. This catalog describes the usage for each function.

You could do everything janitor does on your own, but we don’t always have the time to clean up data without help.

Benefits to using Janitor over writing your own code:

  • Functions are well tested
  • Cleaned column names follow Hadley’s style guide (lowercase words separated by underscores)
  • Generally turns many lines of code into one or two (huzzah!)
  • Pipe-able
  • Written for the education data space

library("janitor")
library("readxl")
library("dplyr")

Two main functions I use all the time:

  • clean_names()
  • get_dupes()
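
As a rough illustration of these two functions, here is a minimal sketch on a made-up data frame. The `roster` data and its column names are hypothetical and are not taken from the real roster file; the point is only to show what each call returns.

```r
library(janitor)

# Hypothetical roster with messy column names and one repeated student ID
roster <- data.frame(
  "Student ID" = c(101, 102, 102, 103),
  "Last Name"  = c("Lee", "Diaz", "Diaz", "Omar"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# clean_names() rewrites the column names in snake_case
cleaned <- clean_names(roster)
names(cleaned)
# "student_id" "last_name"

# get_dupes() returns only the rows whose student_id appears more than once,
# along with a dupe_count column
dupes <- get_dupes(cleaned, student_id)
dupes
```

Running get_dupes() right after clean_names() is a quick way to check whether a column you expect to be a unique key really is one.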

Other really useful functions:

  • remove_empty_rows()
  • remove_empty_cols()
  • excel_numeric_to_date()

Example

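# Path to the source workbook on the shared drive
filepath <- "S:/Data Analytics/State Test Analysis/2016-2017/Uncommon Roster Prep/~Data/Source/Uncommon Roster 2016-17.xlsx"
# Read every column as text; types are converted explicitly below
read_excel(filepath, sheet="Sheet1", col_types = "text") %>%
  clean_names() %>%         # standardize column names
  remove_empty_cols() %>%   # drop columns that are entirely empty
  remove_empty_rows() %>%   # drop rows that are entirely empty
  # Convert the text columns that hold numbers to numeric
  mutate_at(vars(entrydate, exitdate, student_id, yearsinuncommon), as.numeric) %>%
  # Turn Excel serial date numbers into Date values
  mutate_at(vars(entrydate, exitdate), excel_numeric_to_date)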
## # A tibble: 16,649 x 16
##       network school student_id      last_name first_name     grade gender
##         <chr>  <chr>      <dbl>          <chr>      <chr>     <chr>  <chr>
##  1 Collegiate    BEC  220405468         Abassy     Ernest 7th Grade      M
##  2 Collegiate    BEC  208846345   Abdus-Salaam     Saleem 8th Grade      M
##  3 Collegiate    BEC  219633948          Actie     Samach 7th Grade      M
##  4 Collegiate    BEC  242674893           Aguy    Kedrick 5th Grade      M
##  5 Collegiate    BEC  226778173         Alcide       Chaz 8th Grade      F
##  6 Collegiate    BEC  220835102   Alcindor Jr.      Erwin 7th Grade      M
##  7 Collegiate    BEC  229857347          Allen    Kirsten 6th Grade      F
##  8 Collegiate    BEC  220495568          Allen       Cody 6th Grade      M
##  9 Collegiate    BEC  214851875       Alvarado      Angel 8th Grade      M
## 10 Collegiate    BEC  223437591 Alvarado-Rivas    Valeria 8th Grade      F
## # ... with 16,639 more rows, and 9 more variables: ethnicity <chr>,
## #   lunch_status <chr>, iep_status <chr>, ell_flag <chr>,
## #   entrydate <date>, exitdate <date>, exit_explanation <chr>,
## #   yearsinuncommon <dbl>, student_count <chr>
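
The last two steps of the pipeline above can be hard to picture, so here is a small sketch of them in isolation. Because the sheet is read as text, a date column arrives as strings holding Excel serial numbers; these are first converted to numeric and then to dates. The `raw_dates` values are made up for illustration.

```r
library(janitor)

# Hypothetical values as they would arrive from a text-typed Excel column
raw_dates <- c("40000", "42370")

serials <- as.numeric(raw_dates)         # text -> numeric serial numbers
dates <- excel_numeric_to_date(serials)  # serial numbers -> Date
dates
# "2009-07-06" "2016-01-01"
```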

Even more functions

  • tabyl()
  • adorn_totals("row")
  • crosstab()
  • adorn_crosstab()
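
For a feel of the first two functions in this list, here is a minimal sketch on a made-up data frame. The `students` data is hypothetical; tabyl() counts each value of a column and adorn_totals("row") appends a totals row to the result.

```r
library(janitor)

# Hypothetical student list with one row per student
students <- data.frame(
  grade = c("5th", "5th", "6th", "7th", "7th", "7th"),
  stringsAsFactors = FALSE
)

# Frequency table: one row per grade with a count (n) and a percent column
grade_counts <- tabyl(students, grade)

# Append a "Total" row that sums the numeric columns
with_totals <- adorn_totals(grade_counts, "row")
with_totals
```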

Activity: Find the user guide for Janitor.